New disease loci 



EDITOR'S 
CHOICE 



► Additional materials are
published online only. To view
these files please visit the
journal online
(http://jmg.bmj.com/content/49/6.toc).

1 Center for Human Genome
Variation and Department of 
Medicine, Duke University 
School of Medicine, Durham, 
North Carolina, USA 
2 Department of Pediatrics,
Section of Medical Genetics, 
Duke University, Durham, North 
Carolina, USA 
3 Department of Human 
Genetics, University of 
Michigan, Ann Arbor, Michigan, 
USA 

4 Department of Molecular 
Genetics and Microbiology, 
Duke University School of 
Medicine, Durham, North 
Carolina, USA 



Correspondence to 

Dr David Goldstein, Center for 
Human Genome Variation, Duke 
University School of Medicine, 
Box 91009, Durham, NC 27708, 
USA; d.goldstein@duke.edu 



AN and VS contributed equally 
to this work. 



Received 10 February 2012 
Revised 14 March 2012 
Accepted 2 April 2012 
Published Online First 
11 May 2012 







This paper is freely available
online under the BMJ Journals
unlocked scheme, see
http://jmg.bmj.com/site/about/unlocked.xhtml



ORIGINAL ARTICLE 

Clinical application of exome sequencing in 
undiagnosed genetic conditions 

Anna C Need, 1 Vandana Shashi, 2 Yuki Hitomi, 1 Kelly Schoch, 2 

Kevin V Shianna, 1 Marie T McDonald, 2 Miriam H Meisler, 3 David B Goldstein 1,4



ABSTRACT 

Background There is considerable interest in the use of 
next-generation sequencing to help diagnose unidentified 
genetic conditions, but it is difficult to predict the 
success rate in a clinical setting that includes patients 
with a broad range of phenotypic presentations. 
Methods The authors present a pilot programme of 
whole-exome sequencing on 12 patients with 
unexplained and apparent genetic conditions, along with 
their unaffected parents. Unlike many previous studies, 
the authors did not seek patients with similar 
phenotypes, but rather enrolled any undiagnosed 
proband with an apparent genetic condition when 
predetermined criteria were met. 
Results This undertaking resulted in a likely genetic 
diagnosis in 6 of the 12 probands, including the 
identification of apparently causal mutations in four 
genes known to cause Mendelian disease 
(TCF4, EFTUD2, SCN2A and SMAD4) and one gene
related to known Mendelian disease genes (NGLY1). 
Of particular interest is that at the time of this study, 
EFTUD2 was not yet known as a Mendelian disease 
gene but was nominated as a likely cause based on the 
observation of de novo mutations in two unrelated 
probands. In a seventh case with multiple disparate 
clinical features, the authors were able to identify 
homozygous mutations in EFEMP1 as a likely cause for 
macular degeneration (though likely not for other 
features). 

Conclusions This study provides evidence that 
next-generation sequencing can have high success rates 
in a clinical setting, but also highlights key challenges. 
It further suggests that the presentation of known 
Mendelian conditions may be considerably broader than 
currently recognised. 



INTRODUCTION 

Whole-genome and whole-exome sequencing have 
proven remarkably successful in identifying the 
causes of Mendelian diseases. These analyses have 
generally depended on the availability of more than 
one unrelated affected individual and/or linkage 
evidence in at least one family. However, next- 
generation sequencing (NGS) has also succeeded in 
identifying causes of genetic conditions even when 
they are seen in only a single patient. 1-3 

Consequently there is growing interest in the 
introduction of NGS into the clinic to aid in the 
diagnosis of conditions for which no genetic cause 
can be found with targeted testing or chromosomal 
arrays. However, in a clinical setting, patients with 



undiagnosed genetic conditions tend to present 
with a wide range of clinical features, and it is often 
necessary to consider each patient's genome indi- 
vidually, rather than looking for common disrupted 
genes in multiple cases with a similar phenotype. It 
is not clear what success rate NGS approaches will 
achieve in providing genetic diagnoses in this more 
challenging setting. In this study, we have evalu- 
ated the use of NGS to provide genetic diagnoses 
using 12 parent-child trios in which the child had 
congenital anomalies and/or intellectual disabilities 
due to unexplained conditions presumed to be 
genetic. Importantly, the patients were chosen to 
be representative of a clinical sample of undiag- 
nosed genetic conditions, in that they were not 
selected for genetic tractability or phenotypic 
homogeneity. 

METHODS 

Exome sequencing was performed on each patient 
and both parents using the Illumina HiSeq2000 
platform and the Agilent SureSelect Human All 
Exon 50Mb Kit. Detailed methods for laboratory 
work can be found in the online supplementary 
methods. 

Study population 

The research protocol was approved by the Duke 
Institutional Review Board, and all human partici- 
pants or their guardians gave written informed 
consent. Twelve families (child, mother and father) 
were recruited through the genetics clinic at Duke 
University Medical Center based on whether their 
child met two or more of the following criteria: 
(1) unexplained intellectual disability and/or devel- 
opmental delay; (2) one major congenital anomaly; 
(3) 2-3 minor congenital anomalies; and (4) facial
dysmorphisms. In addition, the families were 
required to meet the following eligibility require- 
ments: (1) both biological parents available for 
testing; (2) previous clinically indicated genetic 
testing, including a chromosomal microarray 
(Affymetrix 6.0, http://www.affymetrix.com), 
had been normal; and (3) no evidence of effects of 
teratogens, birth asphyxia or non-accidental trauma. 
Subjects were not eligible if the mother was pregnant
at the time of enrolment. Finally, results were
only returned to patients and/or patient families 
following confirmation of detected variants in 
a CLIA certified laboratory. Controls were subjects 
enrolled in Center for Human Genome Variation 
studies through Duke Institutional Review Board 
approved protocols (n=830). 



J Med Genet 2012;49:353-361. doi:10.1136/jmedgenet-2012-100819






Identification of potentially causal variants 

Sequence Variant Analyser (SVA) 4 (http://www.svaproject.org/) 
was used to identify variants of interest using standard filtering 
criteria (single nucleotide variant (SNV) quality SNV consensus
score, insertion-deletion (INDEL) consensus score >20, INDEL 
quality >50, number of reads supporting SNV or INDEL >3). 
We designed screens to identify highly penetrant genotypes that 
might account for each child's conditions, and prioritised vari- 
ants as follows: (1) homozygous (including hemizygous X 
variants) in the proband and never homozygous in the controls 
(recessive and X-linked variants); (2) heterozygous in the 
proband and absent in the parents and controls (putative de 
novo variants); and (3) from genes with two rare (MAF<0.03) 
variants in the proband that were not seen together in the 
parents or in any controls (compound heterozygotes). All variants,
whether annotated as functional or not, were subjected to
the screens for homozygous, X-linked and de novo candidates;
the screen for compound heterozygous variants was limited to
missense and nonsense SNVs, and frameshift INDELs. Appropriate
functional work, where applicable, was performed based
on the annotated function of the variant (online supplementary 
figure 1). 
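
The first two prioritisation screens described above can be illustrated with a minimal Python sketch. This is not the SVA implementation: the class `TrioSite`, the function `classify_site` and the genotype encoding (0, 1 or 2 copies of the variant allele, with hemizygous X variants in males encoded as 2) are hypothetical names and conventions introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrioSite:
    """Variant-allele counts at one site (hypothetical structure, not SVA's)."""
    proband: int            # 0, 1 or 2 copies; hemizygous X in males encoded as 2
    mother: int
    father: int
    control_homozygotes: int  # number of controls homozygous for the variant
    control_carriers: int     # number of controls carrying the variant at all


def classify_site(site: TrioSite) -> Optional[str]:
    """Return the screen a site passes, or None if it passes neither."""
    # Screen 1: homozygous (or hemizygous) in the proband, never homozygous in controls.
    if site.proband == 2 and site.control_homozygotes == 0:
        return "recessive_or_x_linked"
    # Screen 2: heterozygous in the proband, absent from both parents and all controls.
    if (site.proband == 1 and site.mother == 0 and site.father == 0
            and site.control_carriers == 0):
        return "putative_de_novo"
    return None
```

The compound heterozygote screen needs pairs of variants within a gene and is therefore not covered by this per-site function.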

Further filtering of variants 

Homozygotes 

We removed any homozygous variant that was present in >3% 
of controls (corresponding to a disease frequency of 1 in 4500 or 
greater). For homozygous variants that were not present in the 
heterozygous form in both parents, we first removed those with 
low coverage (<10 reads), and then examined raw alignments 
for the remainder. In all cases, this was sufficient to resolve 
whether the variant was present in the parent but not called 
(because of <3 reads or poor quality scores), or incorrectly called 
as homozygous in the child. 
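
The correspondence between the 3% threshold and a disease frequency of about 1 in 4500 is not derived in the text. One reading that reproduces the figure, offered here only as an assumption, is that 3% refers to the fraction of controls carrying the variant, so that the allele frequency is roughly half of that and the expected homozygote frequency follows from Hardy-Weinberg proportions. The function name below is hypothetical.

```python
def expected_homozygote_frequency(carrier_fraction: float) -> float:
    """Approximate q^2 under Hardy-Weinberg, assuming carriers are mostly heterozygous.

    For a rare allele the carrier fraction is about 2q, so q is about carrier_fraction / 2.
    """
    q = carrier_fraction / 2.0
    return q * q


frequency = expected_homozygote_frequency(0.03)   # q = 0.015
one_in = 1.0 / frequency                          # roughly 1 in 4400 to 4500
```

Under this reading, a variant carried by more than 3% of controls would imply homozygotes more common than about 1 in 4500, which is too frequent for the rare, highly penetrant conditions being sought.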

De novos 

Parental and proband raw alignments were examined for all 
potential de novo SNVs. The majority were ruled out for one
of the following reasons: (a) low coverage in parents (<10x); (b) 
variant is visibly present in parental alignments but not identi- 
fied by SAMtools; or (c) alignments look unconvincing (eg, 
multiple mismatches in same read, variant is at the very ends of 
reads) in proband and/or parents. For potential de novo INDELs, 
we removed those with fewer than five variant reads or with 
a variant/reference read ratio <0.3 (the vast majority) before 
inspection of raw alignments. 
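
The numeric pre-filter applied to candidate de novo INDELs before manual inspection can be sketched as follows. The function name and the handling of a zero reference-read count are assumptions made for illustration; the thresholds (at least five variant reads, variant/reference read ratio of at least 0.3) are those stated above.

```python
def passes_indel_prefilter(variant_reads: int, reference_reads: int) -> bool:
    """Return True if a candidate de novo INDEL survives the read-count pre-filter."""
    # Remove candidates supported by fewer than five variant reads.
    if variant_reads < 5:
        return False
    # Assumption: with no reference reads the ratio is unbounded, so the candidate is kept.
    if reference_reads == 0:
        return True
    # Remove candidates whose variant/reference read ratio is below 0.3.
    return variant_reads / reference_reads >= 0.3
```

Candidates passing this filter would still require inspection of the raw alignments in the proband and both parents.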

Compound heterozygotes 

Raw alignments for all potential compound heterozygous vari- 
ants were inspected in the proband and parents to ensure that 
the contributing variants were each inherited from a different 
parent. 
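
The inheritance check for compound heterozygotes can be expressed as a small sketch, assuming that parental genotypes have already been confirmed from the raw alignments. The function names and the tuple layout `(mother_copies, father_copies)` are hypothetical.

```python
from typing import Optional, Tuple


def parent_of_origin(mother_copies: int, father_copies: int) -> Optional[str]:
    """Return which parent transmitted a variant, or None if this cannot be resolved."""
    if mother_copies > 0 and father_copies == 0:
        return "mother"
    if father_copies > 0 and mother_copies == 0:
        return "father"
    # Both parents carry the variant (ambiguous) or neither does (possible de novo).
    return None


def inherited_from_different_parents(variant_a: Tuple[int, int],
                                     variant_b: Tuple[int, int]) -> bool:
    """True if two variants in the same gene were each inherited from a different parent."""
    origin_a = parent_of_origin(*variant_a)
    origin_b = parent_of_origin(*variant_b)
    return origin_a is not None and origin_b is not None and origin_a != origin_b
```

A pair such as a maternally inherited frameshift and a paternally inherited nonsense variant in one gene would pass this check, whereas two variants transmitted by the same parent would not.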

Communication of results to families 

All families underwent genetic counselling at the time of 
participation. In the initial counselling session, de novo, auto- 
somal-recessive and X-linked inheritance patterns were discussed 
and it was emphasised that autosomal-dominant conditions 
with incomplete penetrance, synergistic heterozygosity, mito- 
chondrial disorders and epigenetic changes would not be 
detected with this approach. All families were aware that
a variant of interest, if detected, might not be definitively
proven causal, and also that no results might be obtained. Parents
were informed that variants of uncertain significance would not 



be reported to them. We debated whether we should re-contact families
after completion of the study in the event that a variant of
uncertain significance was subsequently thought to be causal,
but decided that it was not feasible to offer to do so.
Variants thought to be causal or reasonably thought to 
contribute to the patient's phenotype were confirmed in a CLIA- 
certified laboratory prior to communication to the families, at 
which time a second genetic counselling session was arranged 
for discussion of results. With the permission of the families, 
the information was then communicated to the child's 
physicians. 

For families wherein there would be no conclusive results, 
the second counselling session would be held after completion 
of the sequence data analyses. It was discussed with families 
that secondary or incidental findings in the child or the parents 
would not be intentionally screened for. If incidentally observed, 
the only variants that would be communicated were those 
within known genes that would result in premature death if 
untreated. Detection of carrier status in the affected child 
would not result in communication of such results. Detection of 
carrier status in the parents for known genetic conditions would 
be communicated to them, although it was emphasised that the 
genomes would not be proactively screened for such variants. 

RESULTS 

Exome sequencing of each trio (table 1) resulted in an average 
coverage at captured regions of 71x (table 2). We used the SVA 
software, 4 followed by manual inspection of candidate variants, 
as described in the online supplementary methods, to screen 
for candidate homozygous, X-linked, compound heterozygous
and de novo variants. The SVA screening produced a list of 260 
candidate de novo SNVs and 364 candidate de novo INDELs, of 
which 18 SNVs (7%) and 2 INDELs (0.5%) were retained as 
high-confidence variants after manual inspection (table 2). Using 
this screening procedure, we found a likely genetic diagnosis in 
six of the families, a likely explanation for one of the clinical 
features in a seventh subject and a number of suggestive muta- 
tions in other families. No secondary (incidental) variants were 
detected in the probands or their parents. 

Likely genetic diagnosis

Trios 1 and 7: EFTUD2

Depending on how 'functional' mutations are defined, sequencing 
studies suggest an average of about one functional de novo 
mutation per genome. 5 In this study, we see a total of 20 high- 
confidence de novo variants, somewhat higher than reported 
for controls. 6 A particularly striking observation is that of these 
20 de novo putatively functional variants, two were observed in 
the same gene, EFTUD2, in trio 1 and trio 7. Both variants were
confirmed as de novo with Sanger sequencing. Very approxi- 
mately, assuming (incorrectly) that each gene of the approxi- 
mately 22 000 captured is equally likely to harbour a de novo 
mutation, the likelihood of seeing the same gene affected by 
chance in 2 of 20 de novos is 0.0086, suggesting the possibility 
of involvement of EFTUD2 in these patients' conditions. The
patients share some clinical features (table 3), although they 
were not originally considered to be similar. 
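
The approximate probability quoted above can be reproduced with a calculation of the "birthday problem" type: the chance that at least two of 20 de novo variants fall in the same gene when each of 22 000 genes is treated as equally likely. The text does not state the exact method used, so the sketch below is only one calculation that yields the same figure; the function name is hypothetical.

```python
def prob_any_shared_gene(n_variants: int, n_genes: int) -> float:
    """Probability that at least two of n_variants hit the same gene (uniform genes)."""
    p_all_distinct = 1.0
    for already_placed in range(n_variants):
        # The next variant must avoid every gene that is already occupied.
        p_all_distinct *= (n_genes - already_placed) / n_genes
    return 1.0 - p_all_distinct


p_shared = prob_any_shared_gene(20, 22000)   # approximately 0.0086
```

The same value follows from the first-order approximation C(20,2)/22 000 = 190/22 000. As noted in the text, the equal-likelihood assumption is incorrect (genes differ in length and mutability), so the figure is indicative only.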

The variant in trio 1 is a G/A transition located at the +5 
position in the splice donor site of exon 11. G>A mutations 
of the +5G have been observed in several human inherited 
disorders, 7-9 and in some studied examples site-directed muta- 
genesis of the +5G results in reduced splicing efficiency. 10-12 
Investigation of the mRNA isolated from blood of the proband 
and parents did not detect altered splicing or expression level, 
but tissue-specific impaired splicing remains a possibility. 
Table 1 Demographic and clinical features of sequenced patients

| Trio | Sex | Age | Race | Symptoms | Genetic tests performed clinically before enrolment in study |
|---|---|---|---|---|---|
| 1 | M | 8 | Indian | Developmental delay, possible autism, microcephaly, dysmorphic features, spine abnormalities, sensorineural hearing loss | Chromosome microarray (paternally inherited 15q13.3 dup), Fragile X |
| 2 | M | 3 | European-American | Developmental delay, multifocal epilepsy, involuntary movements, abnormal liver function, absent tears | Chromosomes, chromosome microarray, Niemann-Pick type C, hepatocerebral mtDNA depletion panel (POLG1, DGUOK, MPV17), ataxia with oculomotor apraxia type 2 (SETX), Allgrove syndrome, ataxia telangiectasia (ATM), Rett (MECP2), alpha-1 antitrypsin (AAT), congenital disorder of glycosylation (transferrin isoelectric focusing and N-glycan analysis), metabolic tests (Tay Sachs, Sandhoff, mannosidosis, mucolipidosis II, Krabbe, metachromatic leukodystrophy, adrenoleukodystrophy, GAMT, plasma amino acids, plasma acylcarnitine, urine organic acids) |
| 3 | M | 3 | European-American | Developmental delay, autism, coarctation of the aorta, tethered cord, congenital nystagmus and strabismus | Chromosome microarray (maternally inherited 15q26.3 deletion), Smith-Lemli-Opitz, Aarskog |
| 4 | F | adult | European-American | Multiple congenital abnormalities and macular degeneration | Chromosome microarray (2 stretches of loss of heterozygosity on chromosome 2), Fragile X (premutation carrier) |
| 5 | F | 12 | European-American | Severe intellectual disability, autism, bilateral hyperpronated feet, facial dysmorphisms | Chromosomes, chromosome microarray, Rett, Angelman methylation, Fragile X, Cohen syndrome |
| 6 | M | 18 | European-American | Intellectual disability, epilepsy, panhypopituitarism, hypertension, bifid great toe, vertebral segmentation anomalies and sagittal cleft of the vertebra, hypoplastic 13th rib, and delayed bone age | Chromosomes, chromosome microarray, Börjeson-Forssman-Lehmann syndrome |
| 7 | M | 2 | European-American | Microcephaly, facial asymmetry, acyanotic Tetralogy of Fallot; history of small muscular ventricular septal defect; right aortic arch with mirror image branching; malformed right ear with hearing loss, bifid uvula, cleft soft palate | Chromosome microarray, CHARGE (CHD7) |
| 8 | M | 16 | European-American | Severe intellectual disability, dysmorphic features evident, bicuspid aortic valve, bilateral coronal craniosynostoses, quadriplegic cerebral palsy, bilateral inguinal hernias, G-tube placement and obstructive sleep apnoea | Chromosome microarray, craniosynostosis syndromes (FGFR2), non-syndromic craniosynostosis (FGFR3), Saethre-Chotzen syndrome (TWIST) |
| 9 | F | 4 | Algerian | Developmental delay, bilateral congenital cataracts and strabismus, ventricular and atrial septal defects, a unilateral clubfoot, and unilateral choanal atresia | Chromosome microarray (long stretch of loss of heterozygosity on chromosome 17), CHARGE (CHD7), PAX6, 7-dehydrocholesterol and cholesterol levels |
| 10 | M | 11 | European-American | Attention deficit hyperactivity disorder, language delays, coarse facial features, bilateral mandibular cysts, low muscle tone | Chromosome microarray, Costello (HRAS), Gorlin (PTCH), comprehensive Noonan sequencing array (BRAF, HRAS, KRAS, MAP2K1, MAP2K2, PTPN11, RAF1, SHOC2 and SOS1), MPS panel |
| 11 | M | 9 | European-American | Severe intellectual disability, developmental delay, seizures/infantile spasms, hypotonia and minor dysmorphisms | Chromosomes, chromosome microarray (familial Xp11.4 duplication), acylcarnitine profile, plasma amino acids, urine organic acids, creatine/guanidinoacetate analysis in urine and blood |
| 12 | F | 4 | European-American | Speech delay, borderline microcephaly, failure to thrive, dysplastic nails, ventricular septal defect and hip dysplasia | Chromosomes, chromosome microarray |


Documentation of a functional effect on splicing will be 
required to confirm pathogenicity of this variant. The EFTUD2 
variant in trio 7 is a frameshift INDEL causing the premature 
termination of the protein at the end of exon 9 (residue 222/ 
962). This study thus identified EFTUD2 as a leading candidate
for explaining the conditions in these children. Subsequent to 
this work, Lines and colleagues 13 very recently reported an 
analysis of 12 patients with Mandibulofacial Dysostosis with 
microcephaly, and found that all have de novo mutations in 
EFTUD2. On examination, both these patients show similarities
to the children in this report, and the patient from trio 7
fits the condition very closely. 

Trio 2: NGLY1 

Screening for compound heterozygous variants revealed that 
patient 2 had inherited a frameshift variant in the last exon of 
NGLY1 from his mother, and a nonsense mutation in exon 8 
from his father. NGLY1 encodes N-glycanase 1, which is involved
in the degradation of misfolded glycoproteins. N-glycanase 1 has
not been associated with a specific disorder, but the phenotype
of this child is consistent with a congenital disorder of glycosylation
(table 1), although transferrin isoelectric focusing and
N-glycan analyses have been normal on repeated testing. To
further explore the effect of these variants, we compared NGLY1
protein expression in leucocytes extracted from blood from the
patient, his parents and three controls. Both parents showed
reduced expression compared with controls, and the patient had
barely discernible levels of NGLY1 (figure 1). Dysfunction of
NGLY1 would be expected to result in abnormal accumulation
of misfolded glycoproteins due to impaired degradation. In our
patient, liver biopsy showed an amorphous unidentified substance 
throughout the cytoplasm, suggestive of stored material in the 
liver cells. It is to be noted that extensive testing for lysosomal 
storage had also been pursued in this child, and all the results had 
been normal. Further cellular assays are underway to better 
characterise this mutation. 

Trio 3: SMAD4 

A de novo non-synonymous mutation was identified in SMAD4 
in trio 3, resulting in an isoleucine to valine substitution at 
amino acid position 500 (I500V). This variant has recently been 
reported to be the causal variant in approximately half of all 
Table 2 Exome sequencing quality and summary of rare homozygous and de novo variants

| Trio | Proband: captured regions with coverage >10 (%) | Proband: average coverage, captured regions | Mother: captured regions with coverage >10 (%) | Mother: average coverage, captured regions | Father: captured regions with coverage >10 (%) | Father: average coverage, captured regions | Rare homo | X-linked | Cpd hets | De novos confirmed by Sanger sequencing/TaqMan | De novos not confirmed but high coverage, good quality alignment | Variants of interest |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 89.8 | 115 | 90.19 | 64 | 90.59 | 65 | 25 | 17 | 0 | EFTUD2 i-e | ZPF90 NS, TMEM175 S | EFTUD2 |
| 2 | 88.9 | 63 | 87.5 | 60 | 88.37 | 58 | 6 | 9 | 2 | none | | NGLY1 |
| 3 | 88.2 | 58 | 85.2 | 48 | 86.29 | 45 | 9 | 6 | 0 | SMAD4 NS | NTSR1 NS, AC121493.1 NS | SMAD4 |
| 4 | 91.5 | 80 | 88.34 | 72 | 90.85 | 73 | 15 | NA | 0 | | ATP6AP2 NS | EFEMP1 |
| 5 | 90.1 | 64 | 90.67 | 71 | 91.02 | 67 | 7 | NA | 1 | TCF4 NS | RBM43 NS | TCF4 |
| 6 | 88.9 | 60 | 87.18 | 102 | 89.07 | 61 | 16 | 6 | 0 | HNRNPU ESS | SMAD1 NS | HNRNPU, SMAD1 |
| 7 | 87.3 | 68 | 89.75 | 87 | 90.33 | 94 | 7 | 8 | 0 | EFTUD2 FS | | EFTUD2 |
| 8 | 85.9 | 55 | 89.72 | 81 | 92.39 | 106 | 3 | 4 | 0 | | | None |
| 9 | 89.3 | 57 | 80.68 | 100 | 85.63 | 60 | 36 | NA | 0 | | ZNF266 S, C12orf51 NS, SAMDU FS | None |
| 10 | 90.8 | 72 | 90.59 | 68 | 78.37 | 83 | 6 | 5 | 0 | | MAST1 NS | |
| 11 | 91.6 | 88 | 89.54 | 67 | 89.22 | 71 | 8 | 5 | 1 | SCN2A NS | TBC1D1 NS | SCN2A |
| 12 | 91.0 | 77 | 90.61 | 68 | 90.33 | 71 | 4 | NA | 0 | | NR1H3 NS, AP4M1 in | None |

1=percentage of captured regions with coverage>5; 2=average coverage captured regions (x); NS, non-synonymous; S, synonymous; i-e, intron-exon boundary; in, intronic variant; FS, frameshift variant; ESS, change in essential splice site.



cases of Myhre syndrome, a clinically heterogeneous and rare 
developmental disorder. All other cases in these reports were 
caused by substitutions at the same position, including ile500thr 
and ile500met. Myhre syndrome is characterised by variable
short stature, short hands and feet, facial dysmorphisms, 
muscular hypertrophy, skin thickening, joint limitation, deaf- 
ness and cognitive delay. 14-16 Our patient did not present as 
a typical case. Although he has hearing loss, cognitive impair- 
ment and some of the characteristic facial dysmorphisms as 
well as ocular anomalies and congenital heart defects, he lacks 
some key diagnostic features including short stature, muscular 
hypertrophy, joint limitation, skin thickening and skeletal 
abnormalities. However, he is much younger than most reported 
patients, and it is possible that some manifestations such as 
joint stiffness, muscular hypertrophy and the skin thickening 
may emerge later. He also has scoliosis, which has not previously 
been described as a feature of Myhre syndrome. This case 
illustrates that with NGS, more early diagnoses and detection 
of patients with atypical presentations of Mendelian disorders 



Table 3 Clinical features of the two patients with EFTUD2 mutations, demonstrating similarities and dissimilarities between the two

| Feature | Case 1 | Case 7 |
|---|---|---|
| Developmental delay | Yes | Yes |
| Microcephaly | Yes | Yes |
| Vertebral anomalies | Yes - fusion of C2 to C5 vertebrae | None |
| Hearing loss | Yes - sensorineural hearing loss on both sides | Hearing loss on the right side |
| Auricles | Abnormal | Abnormal |
| Limbs | Hypoplastic right thumb/limited flexion and extension of the right interphalangeal thumb joint | Normal |
| Palate | Normal | Soft cleft palate |
| Cardiac anomaly | None | Tetralogy of Fallot |
| Facial asymmetry | None | Yes |


would occur, resulting in widening of the phenotypic spectrum 
of these disorders. 

Trio 5: TCF4 

A novel de novo mutation was found in TCF4, a gene known to 
carry mutations responsible for Pitt-Hopkins syndrome (PHS). 
Sanger sequencing confirmed that the mutation is de novo, and 
a TaqMan assay in 1298 controls found no other carriers. We 
then evaluated the mRNA of the trio and found that the variant 
destroys the 3' splice site of exon 9 (655 G>A, D219N), resulting
in the incorporation of 37 incorrect amino acids before intro- 
duction of a stop codon and premature termination. Examina- 
tion of protein expression showed that the variant protein was 
completely degraded through the ubiquitin-proteasome system 
(figure 2). This is likely to lead to haploinsufficiency of TCF4, the 
known cause of PHS. 

In retrospect, our patient's features of wide mouth, high 
cheekbones, deep-set eyes, limited speech and severe intellectual 
disabilities, are consistent with a diagnosis of PHS. She lacks 
the characteristic hyperventilation (seen in 86% of reported 
cases) and epilepsy (70%). 17 Due to the absence of both these 
distinctive features, she had not been tested for this disorder, 
although it had been considered in the differential diagnosis. 

Trio 11: SCN2A 

A de novo variant was identified in SCN2A, a neuronal voltage-
gated sodium channel gene. The mutation was at a site for
which no previous mutation had been reported. Approximately
20 de novo and inherited variants in SCN2A have been reported
to cause seizure disorders, mostly mild but occasionally
accompanied by severe intellectual disabilities including
infantile epilepsy. 18-30 This non-synonymous SCN2A variant,
Asp1598Gly, has a PolyPhen score of 0.99 (range 0 to 1), meaning
that it is very likely to be detrimental to the protein, 31 and was
confirmed to be de novo by Sanger sequencing. Residue Asp1598
is located in transmembrane segment D4S3 of sodium channel



Figure 1 Expression of endogenous NGLY1 protein in peripheral blood 
mononuclear cells from patient, parents and three unrelated healthy 
controls. The protein expression level in the patient is less than both parents 
and healthy controls. GAPDH, glyceraldehyde 3-phosphate dehydrogenase. 

Nav1.2, within the sequence WNIFDF that is highly conserved
in mammalian and invertebrate voltage-gated sodium channels 
(figure 3). In the bacterial sodium channel, the corresponding 
sequence is WSLFDF, and the recently determined crystal 
structure indicates that this aspartate residue (D80) can form a 
hydrogen bond with a positive (arginine) gating charge in 
transmembrane segment S4. 32 Conversion of this aspartate to 
the non-polar glycine residue would prevent this interaction, 
potentially impairing regulation of channel opening. These 
considerations strongly indicate the pathogenicity of this 
mutation. 

Further support for the role of this mutation comes from the 
closely related sodium channel SCN1A. SCN1A and SCN2A
arose by gene duplication during vertebrate evolution, and retain 
87% amino acid sequence identity (1747/2005) with most 
divergence in non-transmembrane domains. A de novo mutation 
in the corresponding residue of SCN1A, D1608Y, was found in 
a patient with severe myoclonic epilepsy of infancy, which 
like our patient is characterised by infantile seizures and 
intellectual disability. 33 Three additional missense mutations 
in transmembrane segment D4S3 of SCN1A have been identified
in patients with epilepsy (http://www.molgen.ua.ac.be/SCN1AMutations/),
further demonstrating the pathogenic
potential of this transmembrane segment of the protein. 



SCN2A is not routinely included in DNA testing for epilepsy
because mutations of SCN1A are much more common. 

Interesting findings 

In the remaining six cases, no variants judged as likely to be 
causal for most or all features were identified, although in two 
cases one or more interesting candidate variants were found. 

Trio 4 

Exome sequencing revealed several regions of homozygosity 
including several homozygous variants in EFEMP1 (two intronic
SNVs and a 3'UTR INDEL), a gene in which heterozygous 
mutations are known to cause early onset maculopathies. 34-36 
Subsequent to this finding, it was judged that the patient's 
retinal phenotype of bilateral and symmetric distribution of 
drusenoid deposits most likely reflects dysregulation of the 
function of EFEMP1 (E Heon, personal communication). A real- 
time reverse transcriptase PCR assay indicated that the level 
of EFEMP1 expression in blood is too low to assess any effects
of the variant on controls. This patient also carries a de novo
non-synonymous coding SNV with a PolyPhen score of 0.999 in
the gene ATP6AP2. This gene encodes the (pro)renin receptor
and has multiple functions in the eye, heart, kidney, central 
nervous system and other tissues. 37-39 This patient highlights 
the fact that some subjects who would undergo NGS may very 
well have more than one underlying diagnosis, and that all 
causative variants may not be detected. 

Trio 6 

A de novo variant was observed in the 5' consensus splice site
of exon 9 of the HNRNPU gene, which encodes HnRNP U.
This gene is in the critical target region for the seizure phenotype
of patients with microdeletion of 1q43-44, 40 41 a highly
variable syndrome characterised by speech delay, intellectual
disability and seizures. In mice, HnRNP U has been shown to
be linked to preaxial polydactyly caused by abnormal expression
of SHH during limb development, 42 and normal HnRNP 
U expression is essential for embryonic development. 43 We have 
been unable to demonstrate a functional effect of the de novo 
variant in blood, but it remains possible that it affects expression 
of a particular isoform, perhaps in a tissue-specific manner 
during development. In addition, this patient has a de novo 
mutation in SMAD1, a gene that partners with SMAD4 in bone
morphogenetic protein signal transduction. 44 Given the associ- 
ation of de novo SMAD4 mutations with a spontaneous clini- 
cally heterogeneous developmental disorder (see above), it is 
possible that mutations in its close partner gene may cause
similar phenotypes.

Interesting variants ultimately considered unlikely to be causal 

Trio 2 

A synonymous inherited X chromosome variant was found in 
the GPM6B gene, which has been considered a good candidate 
for causing cases of Pelizaeus-Merzbacher disease. 45 Since Peli- 
zaeus-Merzbacher Disease had been considered as a diagnosis for 
this patient, we tested the cDNA from the trio for possible 
effects on splicing, and the DNA from the maternal grandpar- 
ents to examine the inheritance. We found that the mutation 
had no effect on the cDNA sequence of the patient, and that it 
was inherited from the maternal grandfather. This illustrates
the importance of tracking candidate variants through the 
relevant pedigree before reaching a judgement concerning 
pathogenicity. 

Trio 10 

This patient has an inherited X chromosome variant predicted 
(by Genie) to affect the 3' splice site of exon 8 of ACSL4, a gene 
linked to intellectual disability with absent or severely delayed 
speech and dysmorphic facial features. 47 However, cDNA 
sequencing revealed no differences between the patient and his 
parents. Combining this lack of a functional effect with a poor fit
between the phenotype of the child and that associated with
known mutations in the gene suggests that this variant is unlikely
to be responsible.

For the trios with no likely or suggestive causal variant, we 
will perform whole-genome sequencing to screen for variants 
that might have been missed by whole-exome sequencing such 
as exonic variants that were not captured, or structural variants 
not identifiable from exome data. 

DISCUSSION 

This study highlights both the challenges and opportunities in the 
application of NGS to clinical diagnosis in patients with intellec- 
tual disabilities/congenital anomalies. In cases where we found 
a clear and likely cause of the condition, this conclusion depended 
on the knowledge of Mendelian diseases associated with the rele- 
vant genes. Two of these genes are already well known: TCF4 and 
SCN2A; however, the mutations we detected were novel. The 
example of EFTUD2 is of particular interest. Before the recent 
identification of this gene, a possible case could be made on the 
basis of seeing de novo mutations in two of our patients, although 
we failed to show a functional effect of one of the two variants in 
the available tissue. Subsequent comparison of their phenotypes 
revealed a number of similarities. This example shows that 
a discovery paradigm focusing on a broad range of conditions 
provides an important complement to the more common current 
strategy of combining patients with similar conditions on strictly 
clinical criteria. By studying the genetics of a broader range of 
conditions as we did, it is possible to make a careful assessment of 
any phenotypic overlap of patients that have possible causal 
mutations in the same genes. In this way, it may be possible to 
identify conditions with broader phenotypic presentations than is 
possible in the strictly 'phenotype first' framework. However, we 
do note that confirmation that EFTUD2 is causal required its 
recent identification by Lines and colleagues. 7 It is noteworthy that 
our study of only 12 patients pointed toward the possibility of 
EFTUD2 involvement in two of the cases. If a programme such as 
the one used here were applied to many hundreds and eventually 
thousands of unexplained conditions, it is very plausible that many 
new genes would be nominated and confirmed using exactly this 
strategy. 

Furthermore, information gained from genome sequencing 
as described here, focused on a broad range of patients, will likely 
expand the phenotypic spectrum of many currently well-known 
genetic disorders. Clinical decisions regarding whether or not to 
perform a genetic test largely depend on how well the patient 
fits the clinical description of the disorder. Although mutations 
in TCF4 are known to cause the well described PHS, the patient 
in this study did not exhibit two of the most common and 
differentiating symptoms of this disorder (periods of hyperven- 
tilation and seizures), and although the condition was consid- 
ered, the diagnostic yield was not thought to be high enough to 
warrant testing. Similarly, the patient with the SMAD4 muta- 
tion that is known to cause Myhre syndrome did not show 
a typical manifestation of this syndrome. It is possible that there 
are many well described genetic conditions in which the vari- 
ability in the phenotypic spectrum is not currently appreciated, 
and NGS may facilitate considerable broadening of this spec- 
trum. The real power of diagnostic sequencing will depend on 
establishing very large databases that include mutations of 
interest and corresponding phenotypes. For example, intellectual 
disabilities and/or congenital abnormalities occur in approximately 
3-4% of children, 48 49 and a majority of these are due to 
underlying genetic causes, yet close to 50% of children with one 
or both of these phenotypes remain undiagnosed. 50 51 It is likely 
that a high proportion of these undiagnosed cases will start to be 
sequenced annually in the next few years, creating the oppor- 
tunity for very large databases that will permit the identifi- 
cation of currently unrecognised genotype-phenotype 
connections. 

The suggestive finding for NGLY1 is also of particular interest. 
Rather than being a gene known to be responsible for a 
Mendelian disease with phenotypic similarity to the patient 
under study, this gene clearly acts in the same pathway as the 
known genes causing the Mendelian disorder that had been 
considered for the child, that is, a congenital disorder of glyco- 
sylation. This case illustrates how we can leverage known 
information about the function of a gene, and in particular its 
action within a pathway already implicated in Mendelian 
disease, to help identify new genetic diagnoses. 

Our work also demonstrates the importance of the use of 
'general', non-gene-specific functional evaluation of gene 
expression to confirm the pathogenicity of a variant. Since the 
de novo mutation in TCF4 had not been described before and 
involved a splice site, it made a strong but not definitive case 
for causality. Functional studies demonstrated that the mutation 
in TCF4 disrupts splicing and results in a protein targeted for 
degradation, which confirms causality. This work, therefore, 
helps to establish a general paradigm for such clinically moti- 
vated sequencing which includes not only the identification of 
candidate variants but also a generalised function evaluation of 
their impact on gene expression and splicing. However, as the 
number of sequenced patients increases, and as these data are 
increasingly shared in public databases, the need for functional 
work for some variants will decrease as the same variants are 
shown to occur in multiple patients with similar presentations 
(as for the SMAD4 variant in trio 3). 

It is also important to emphasise that the paradigm we 
adopted in this study is likely to be similar to how NGS would 
be applied in clinical genetics practice, since general genetics 
clinics would have patients with widely differing phenotypes. 
Our study demonstrates the type of patients that would be 
sequenced in these clinics and provides data regarding 
expectations of finding a cause, the importance of functional 
assays for probable variants and the value of pre-screening 
patients to determine eligibility for NGS. With our inclusion and 
exclusion criteria, we set out to maximise the likelihood that 
an underlying undiagnosed genetic condition was present in 
each of the enrolled patients, and found causal or likely and 
interesting variants in 8/12 patients. It is also likely that in 
clinical practice, partial explanations would be detected for 
diverse manifestations in the same patient, as in trio 4 in our 
study, emphasising the complexity of genetic counselling for 
such a patient whose manifestations are likely to be due to 
more than one underlying genetic cause. Establishing a diag- 
nosis is often of value even when a clear change in treatment 
is not indicated by the diagnosis. For example, close and 
ongoing observation for seizures is now indicated for patient 5 
(TCF4), as is avoidance of medications that may trigger 
seizures, such as antihistamines. 52 The family can be 
informed that the disorder is due to a de novo variant, and in 
the absence of parental mosaicism, other family members are 
not at risk, and with future pregnancies the recurrence risk for 
the parents is low. Additionally, they can learn about PHS, 
have a better idea of future expectations, and reach out for 
support from families with similarly affected children. Similarly, 
patient 11 (SCN2A) should avoid common anti-epileptic 
drugs whose primary mechanism is sodium channel inhibition, 
since these exacerbate symptoms in patients with SCN1A 
mutations. 53 A confirmed molecular diagnosis may also protect 
patients from incorrect diagnoses that could lead to unhelpful 
therapy options. 

While cost benefit analyses were not the focus of this work, 
it is interesting to note that some of the patients who now have 
a genetic diagnosis underwent many genetic tests prior to 
exome sequencing at a considerable estimated cost (eg, more 
than $22 000 was spent on laboratory investigations in Trio 2). 
While estimating the real costs of exome sequencing is difficult, 
it is already clear that in some cases, interrogating genes one by 
one or in panels will rapidly lead to greater total costs than 
exome or whole-genome sequencing. While these considerations 
are encouraging, as is the success rate of six likely genetic diag- 
noses out of 12 cases (with one further case likely explained 
partially), this work was performed in a research environment 
and there will be many challenges involved in a transition to 
fully clinically based applications. Itemising those challenges, 
from cost and reimbursement to the type and manner of 
communication to the families (including the issue of incidental 
findings), is beyond the scope of this work, but we would 
highlight two challenges in particular. First, in our experience, 
laboratory-based functional analysis is an important part of the 
evaluation, and it remains unclear how this would 
be incorporated into routine clinical application of NGS, even 
as NGS is beginning to be offered by commercial laboratories 
as a clinical test. Second, this work required substantial manual 
interrogation of both sequence data and candidate genes. 
Although variant calling procedures are continually improved 
and there are likely to be routines developed to simplify the 
process of candidate identification, 54 it seems likely that for 
the foreseeable future, some level of expert judgement will 
continue to be required to identify causal mutations from 
sequence data, which will contribute to the cost and time of 
this type of diagnostics. Currently, it is difficult to imagine how 
the level of both variant inspection and functional evaluation 
could be provided as part of routine clinical diagnostic testing. 
These current essential functions, therefore, present a significant 
challenge to the use of NGS to provide genetic diagnoses. 
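
The kind of routine that might simplify candidate identification can be sketched as a basic trio filter. The following is a minimal illustration and not the pipeline used in this study: the genotype encoding (count of alternate alleles, 0, 1 or 2, at an autosomal site), the class and function names, and the example variant identifiers are all hypothetical. It flags only the two patterns named in this paper, de novo and homozygous variants, and the expert inspection and functional follow-up discussed above would still be needed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TrioGenotype:
    """Alternate-allele counts (0, 1 or 2) at one autosomal site (hypothetical encoding)."""
    variant_id: str
    child: int
    mother: int
    father: int


def classify(site: TrioGenotype) -> Optional[str]:
    """Return a candidate label for the site, or None if it is not flagged."""
    # De novo: the child carries the variant and neither parent does.
    if site.child > 0 and site.mother == 0 and site.father == 0:
        return "de_novo"
    # Recessive homozygous: the child has two copies, each parent carries one.
    if site.child == 2 and site.mother == 1 and site.father == 1:
        return "homozygous_recessive"
    return None


def candidates(sites: List[TrioGenotype]) -> List[Tuple[str, str]]:
    """Keep only the sites that match one of the flagged inheritance patterns."""
    flagged = []
    for site in sites:
        label = classify(site)
        if label is not None:
            flagged.append((site.variant_id, label))
    return flagged


example_sites = [
    TrioGenotype("var_a", child=1, mother=0, father=0),
    TrioGenotype("var_b", child=2, mother=1, father=1),
    TrioGenotype("var_c", child=1, mother=1, father=0),
]
print(candidates(example_sites))
```

In practice such a filter would be applied after quality control and frequency filtering, since genotype errors in a parent can make an inherited variant look de novo.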



Finally, we note that there are a number of reasons that 
causal variants may have been missed in some trios in this study. 
One important factor is that we do not have a comprehensive 
understanding of the function of most genes. For genes whose 
function is not well characterised, extensive functional follow- 
up may be required to assign causality to a de novo or homo- 
zygous variant carried by an individual patient. We may also fail 
to detect some causal variants. Exome sequencing does not 
capture all exons, nor non-coding regulatory regions, and 
structural genomic variants such as CNVs are difficult to 
recognise. Additionally, variants within captured regions may 
be missed by the mapping/variant-calling algorithms. In the 
future, we anticipate this approach will be improved by the 
use of whole-genome sequencing and improved variant identi- 
fication, although for the foreseeable future a small proportion 
of the genome will remain refractory to high-throughput 
sequencing. It is also possible that causal variant(s) may exert 
their effects through more complex inheritance patterns than 
investigated in this study. 

In summary, this work indicates that the application of 
NGS should be strongly considered in all cases where a genetic 
condition is strongly suspected but traditional clinical genetic 
testing has proven negative. Furthermore, in some cases at least, 
it is likely that NGS will prove faster and less expensive than the 
long diagnostic odyssey many families now endure. However, 
our work, like that of others, offers the cautionary note that 
it will probably be possible to identify very strong candidate 
variants in any sequenced genome and that further studies 
such as functional assays or multiple patients with mutations 
in the same gene will often be needed to establish causality. 
Considerable attention must be paid to establishing appropriate 
standards of evidence before the results of NGS are used to 
influence patient care, and establishing such standards will be 
a major challenge for NGS in the clinic. 